{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Inferential statistics (dataset 2)\n", "Often, we are not only interested in describing our data with descriptive statistics like the mean and standard deviation, but also want to know whether two or more sets of measurements are likely to come from the same underlying distribution. We want to draw inferences from the data. This is what inferential statistics is about.\n", "\n", "To learn how to do this in Python, let's use some example data:\n", "\n", "To test whether a new wonder drug improves eyesight, Linda and Anabel ran the following experiment with student subjects:\n", "\n", "Experimental subjects were injected with a saline solution containing 1 nM of the wonder drug. Control subjects were injected with saline without the drug.\n", "The drug is only effective for an hour or so. To assess the effect of the drug, eyesight was scored by testing the subjects' ability to read small text within one hour of drug injection.\n", "\n", "However, Linda and Anabel used two different experimental designs:\n", "1. Linda tested each student on ten consecutive days and measured performance only after the treatment. She used 50 control subjects (saline only) and 50 experimental subjects (saline + drug), so 100 subjects in total.\n", "2. Anabel performed only a single test per subject, but she measured eyesight 30 minutes before and 30 minutes after the treatment. She tested 60 different subjects.\n", "\n", "Our task is now to decide whether the wonder drug really improves eyesight as tested in these two sets of experiments.\n", "\n", "Let's look at the second dataset." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import scipy\n", "\n", "plt.style.use('ncb.mplstyle')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | animal | score_before | score_after | treatment |
|---|---|---|---|---|
| 0 | 0 | 14.248691 | 9.776487 | 0 |
| 1 | 1 | 9.943656 | 8.854063 | 0 |
| 2 | 2 | 12.730815 | 6.396923 | 0 |
| 3 | 3 | 14.489624 | 9.477586 | 0 |
| 4 | 4 | 11.638078 | 10.501259 | 0 |
| ... | ... | ... | ... | ... |
| 95 | 95 | 13.320677 | 16.738985 | 1 |
| 96 | 96 | 14.809317 | 18.222113 | 1 |
| 97 | 97 | 12.318100 | 12.745123 | 1 |
| 98 | 98 | 12.204639 | 16.840564 | 1 |
| 99 | 99 | 12.621903 | 18.088884 | 1 |

100 rows × 4 columns
\n", "